Mapping the prevalence of cancer risk factors at the small area level in Australia

Background Cancer is a significant health issue globally and it is well known that cancer risk varies geographically. However in many countries there are no small area-level data on cancer risk factors with high resolution and complete reach, which hinders the development of targeted prevention strategies. Methods Using Australia as a case study, the 2017–2018 National Health Survey was used to generate prevalence estimates for 2221 small areas across Australia for eight cancer risk factor measures covering smoking, alcohol, physical activity, diet and weight. Utilising a recently developed Bayesian two-stage small area estimation methodology, the model incorporated survey-only covariates, spatial smoothing and hierarchical modelling techniques, along with a vast array of small area-level auxiliary data, including census, remoteness, and socioeconomic data. The models borrowed strength from previously published cancer risk estimates provided by the Social Health Atlases of Australia. Estimates were internally and externally validated. Results We illustrated that in 2017–2018 health behaviours across Australia exhibited more spatial disparities than previously realised by improving the reach and resolution of formerly published cancer risk factors. The derived estimates revealed higher prevalence of unhealthy behaviours in more remote areas, and areas of lower socioeconomic status; a trend that aligned well with previous work. Conclusions Our study addresses the gaps in small area level cancer risk factor estimates in Australia. The new estimates provide improved spatial resolution and reach and will enable more targeted cancer prevention strategies at the small area level. Furthermore, by including the results in the next release of the Australian Cancer Atlas, which currently provides small area level estimates of cancer incidence and relative survival, this work will help to provide a more comprehensive picture of cancer in Australia by supporting policy makers, researchers, and the general public in understanding the spatial distribution of cancer risk factors. The methodology applied in this work is generalisable to other small area estimation applications and has been shown to perform well when the survey data are sparse. Supplementary Information The online version contains supplementary material available at 10.1186/s12942-023-00352-5.


Background
In 2020, an estimated 19.3 million people were diagnosed with cancer worldwide [1], causing a huge health burden. Reducing cancer incidence can be achieved by altering particular health behaviours, often termed modifiable cancer risk factors. Whiteman et al. [2] suggest that at least one in every three cancers in Australia can be attributed to modifiable risk factors such as tobacco smoking, obesity, poor diet, insufficient physical activity, excessive sun exposure and alcohol consumption. There is a strong individual level association between these modifiable behaviours and cancer incidence [3,2,4,5].
To better understand how cancer risk factors vary by location and target interventions, many countries have generated small area estimates for their prevalence including Australia [6], the US [7], Canada [8], Iran [9], and Luxembourg [10]. When generating small area estimates, practitioners must consider the reach and resolution of their results. Reach refers to the proportion of the small areas for which estimates are available, while resolution pertains to the small area population and geographical sizes. While the need for high resolution relates to minimizing outcome heterogeneity in larger areas and populations, the need for complete coverage (or high reach) ensures policy makers have complete spatial information. If small area estimates suffer from low reach or resolution the effectiveness of targeted interventions may be impacted.
In Australia, the Social Health Atlases of Australia (SHAA) [6] is the major platform providing nationwide estimates for cancer risk factors at a small area level. The estimates were derived from the 2017-2018 National Health Survey (NHS). However the reach and resolution of the SHAA estimates could be improved. The larger areal units used in the SHAA combine heterogeneous sub-populations, resulting in estimates that are averages over different populations. The limitation regarding reach meant that no estimates are provided for very remote areas. Given that health disparities tend to widen with increasing remoteness [11,12,13], generating estimates for these areas is important for targeted public health initiatives in Australia. The modelled estimates provided by the SHAA use the best data source available, so the problem cannot be solved by using a different dataset or collecting better data; the solution is to use new methods of small area estimation (SAE) [14].
SAE is a well-established survey method that leverages auxiliary data, such as census data, to estimate parameters of interest for small geographic areas with limited or no survey data. Model-based SAE methods, which borrow strength across areas [15], are particularly suitable for sparse survey data. These methods can be applied at either the area [16] or individual level [17], with the latter requiring access to survey and census microdata.
Proportion area-level models are commonly used [18,19,20,21]; however, they become unsuitable when some of the input data (area-level proportion estimates) are unstable, i.e. exactly zero or one [22]. Sparse survey data and modelling rare or common population characteristics exacerbate this instability [23]. Solutions to instability include perturbing direct estimates prior to modelling [24] or excluding unstable areas [19]. Alternatively, modelling at the individual level, such as through multilevel regression and poststratification (MrP) [25], can be pursued. However, the use of individual level SAE models to derive proportion estimates are limited by the need for census microdata [26], which restricts the choice of covariates. Note that the modelling for the SHAA was conducted by the Australian Bureau of Statistics (ABS). Unfortunately, given that the published details of the ABS approach are modest [27], we can only infer the use of a individual level model.
While individual and area level models have limitations, recent work supports the utility of two-stage SAE approaches, which involve separate modelling at both levels [23,28,29,30]. Two-stage approaches have many benefits that are particularly relevant for this application as they can alleviate unstable direct estimates by smoothing individual level outcomes, accommodate even severely sparse survey data thanks to multi-stage smoothing, and utilize more auxiliary data (e.g. survey-only covariates), permitting more flexible models and better predictions.
In this work, we generate small area level prevalence estimates for eight cancer risk factor measures using the Bayesian two-stage small area estimation methodology we developed for sparse survey data [23]. Our method considers a variety of data sources, including individual level survey data and area level auxiliary data such as census, remoteness and socioeconomic data. To assess the quality of our estimates, we used a dual validation approach whereby most SA2s are benchmarked to the sub-state level using fully Bayesian benchmarking [31], with the remaining SA2s (predominantly very remote areas) undergoing external validation. data, 69% were major cities. The SA2 sample sizes tended to be larger for outer regional to very remote areas (median of 11 and IQR of 6 to 21) than for major city areas (median of 8 and IQR of 4 to 12). SA2 level SES was sourced from the ABS Socio-Economic Indexes for Areas product [45]. Like other Australian health studies [43,3,39], we used the Index of Relative Socio-Economic Disadvantage (IRSD). The IRSD is a general SES index constructed using principal components analysis that summarises the economic and social conditions of individuals and households within a given area in order to determine the area's overall relative disadvantage. A low IRSD score indicates a large proportion of relatively disadvantaged individuals in a given SA2 [3].
In this work, we used IRSD national deciles as a categorical variable with 10 groups, where 1 represents the most disadvantaged or lowest SES group and was used as the reference group. Although the IRSD can be used as a continuous variable, it is recommended to use deciles [45], and this also gave superior model performance. There were 44 of the 2221 SA2s without an IRSD value provided, so these had the closest IRSD decile assigned according to their corresponding PC1 (principal component 1) values.
We also obtained prevalence estimates and measures of uncertainty for risky alcohol consumption (more than 2 standard drinks a day on average), adequate fruit intake, obesity, overweight, current smokers and inadequate physical activity from the SHAA [6] at the Primary Health Network (PHN) and PHA level for adults. These data were downloaded as age-standardised rates per 100 people with 95% confidence intervals. Definitions and details are available in Section C and D of the Additional File and the online SHAA platform [6].

Risk factors
Broad risk factor groups were selected by consulting three sources: a wide range of experts in the fields of public health, epidemiology and oncology; literature, specifically evidence for casual associations [46] with, and population attributable fractions [5,47,2] for, cancer incidence; and the availability of data in the 2017-18 NHS. In this work we selected the following five broad risk factor groups: tobacco smoking, alcohol, diet, weight and physical activity. According to the 2015 Australian Burden of Disease study [5] these were attributable to 22.1%, 4.5%, 4.2%, 7.8% and 2.9% of the total cancer burden, respectively. Those who reported to be current smokers (including daily, weekly or less than weekly), and had smoked at least 100 cigarettes in their life.

Risky alcohol consumption
Those who exceeded the revised 2020 National Health and Medical Research Council (NHMRC) guidelines [50] of up to 10 standard drinks/week and no more than 4 standard drinks in any day. Compliance with the guidelines were assessed using self-reported alcohol consumption during the last three drinking days from the proceeding seven days.

Diet Inadequate diet
Based on self-reported diet, those who did not meet both the fruit (2 serves/day) and vegetable (5 serves/day) 2013 NHMRC Australian Dietary guidelines [51].

Weight
Obese Those with a measured BMI greater or equal to 30. Overweight/ obese Those with a measured BMI greater or equal to 25.

Risky waist circumference
Those with a measured waist circumference measurements of ≥94cm (men) and ≥80cm (women). These cutoffs are only appropriate for adults, so we limited the survey dataset to all persons 18 years and older [6].

Physical activity
Inadequate activity (leisure) Based on self-reported leisure physical activity, those who did not meet the 2014 Department of Health (DOH) Physical Activity guidelines [52], i.e. that each week adults (those between the ages of 18 and 64) should either do 2 1/2 to 5 hours of moderate-intensity physical activity or 1 1/4 to 2 1/2 hours of vigorous-intensity physical activity or an equivalent combination of both, plus muscle-strengthening activities at least 2 days each week. The DOH guidelines also provide specific recommendations for children (5 to 17 years), older persons (65 years and older) and pregnant women. In this work, the physical activity measures were derived from the ABS created variables that accommodated the guidelines across age groups. Inadequate activity (all) Although similar to inadequate activity (leisure), this measures is based on workplace and leisure self-reported physical activity.
We explored a variety of possible measures and corresponding definitions for each of the five broad risk factor groups, placing priority on the definitions and recommendations used in the SHAA [6], the work by Whiteman et al. [2], Cancer Council Australia [48] and those provided by Australian government agencies such as Cancer Australia [3], the National Health and Medical Research Council (NHMRC), the Australian Institute of Health and Welfare (AIHW) [49] and the Australian Department of Health and Aged Care (DOH). Table 1 summarizes the five broad risk factor groups and the eight corresponding measures and definitions. Table 2 gives direct estimates for these measures stratified by the eight states and territories of Australia. The risk factor measures proposed are designed to be cross-sectional and strike a natural balance between being specific to cancer while maintaining applicability to a variety of other health conditions [5]. Note that some risk factor groups, for example weight, required several differing measures.
We defined the risk factor measures as binary where a survey individual received a value of one if they did not meet guidelines, or were in the unhealthy category. Unlike the SHAA which provides age-standardised rates by PHAs [6], we used proportions (prevalence) due to their common use in both the literature [14,53,18] and other digital Atlases [7]. Furthermore, deriving age-standardised rates requires prevalence estimates by area and age. This level of disaggregation is possible at the PHA level, but not feasible at the SA2 level. We provide further details and the motivation for the selected risk factor measure definitions in Section B of the Additional File.
3 Statistical models

Bayesian model
Given the sparse nature of the available data for this SAE analysis, we used the Bayesian two-stage logistic normal (TSLN) model we proposed recently [23]. Our previous study shows that the TSLN model can outperform commonly used area [19,20,54] and individual level [55] models both in a simulation study focusing on sparse survey data and an application using the 2017-18 NHS data. The two-stage structure of the TSLN model includes an individual level stage 1 model, followed by an area level stage 2 model. The same TSLN model, with very similar components, was chosen to be applied to all eight risk factor measures. The selection of fixed and random effect structures for the two models was guided by the goal of achieving a balance between parsimony across risk factor measures and predictive performance. We followed the advice by Goldstein [56] and initially used frequentist algorithms to select fixed and random effects, with fully Bayesian inference via Markov chain Monte Carlo (MCMC) for final model checking. Further details regarding model selection are given in Section E of the Additional File.
Let y ij ∈ {0, 1} be the binary value from the NHS for sampled individual j = 1, . . . , n i in SA2 i = 1, . . . , m, where n i is the sample size in SA2 i. Further, let m = 1694 and M = 2221 be the number of sampled and total number of SA2s, respectively. The goal of this analysis is to generate estimates of the true proportions of each risk factor measure, µ = (µ 1 , . . . , µ M ).
In this analysis, we used two versions of the survey weights, w raw ij , provided by the ABS [57,55] to correct for sampling bias and promote design-consistency. The first, w ij , was used for direct estimation and the second,w ij , was used in the stage 1 model (see Section C.1 in the Additional File). Using the survey weights, small area proportion estimates can be computed using the Hajek [58] with an approximate sampling variance of [59,54], Direct estimators, such as Eq. (1) and (2), have low variance and are design-unbiased for µ i when n i is large, but have high variance when n i is small [15].

Stage 1: Individual level model
The stage 1 model is a Bayesian pseudo-likelihood logistic mixed model. Let π ij be the probability of y ij = 1 for sampled individual j in SA2 i. Following the notation of Parker et al. [55], we represent the pseudo-likelihood for a probability density, p(.), as p (y ij )w ij . Pseudo-likelihood is used to ensure the predictions from the logistic model are approximately unbiased under the sample design [60,61]. Thus, the stage 1 model likelihood is given by, where logit (π ij ) is modelled using a generic linear predictor that is application-specific. In this work, we used several unique components summarised in Fig. 2. The linear predictor included eight individual level categorical covariates and seven area level covariates as fixed effects. Unstructured individual and SA2 level random effects were also applied. In addition to these, borrowing ideas from MrP [62], we included two hierarchical random effects based on categorical covariates that were themselves derived from the interaction of numerous individual level demographic and health covariates. A discussion of the priors used is given in Section 3.2. More details can be found in Section C of the Additional File.

Stage 2: Area level model
After fitting the stage 1 model, the individual level predictions are aggregated to the area level, producing stage 1 (S1) proportion estimatesμ The stage 2 model is a Bayesian spatial Fay-Herriot [16] model. Unlike previous two-stage approaches [28,29], we accommodate some of the uncertainty inherent in fitting the stage 1 model by using the vectorθ S1 i as input to the stage 2 model. The stage 2 model likelihood for the posterior draws from the stage 1 model is, where θ i is modelled using a generic linear predictor that is problem specific. The final proportion/prevalence estimate for the ith SA2, denoted µ i , is given by the posterior distribution of logit −1 (θ i ). To ensure that posterior uncertainty remains unaffected by the choice of T , we downscale the likelihood contribution by 1/ T .
In this work, we used several unique components for the linear predictor of θ i which are summarised in Fig. 3. The linear predictor included the SES index deciles and remoteness as standard fixed effects. In addition, PC1 to PC6 were used as fixed effects with coefficients varying according to remoteness. The linear predictor also included an external latent field constructed from the SHAA's estimates and a BYM2 spatial random effect [63] at the SA2 level. Given we did not include SA3 level census covariates, an unstructured random effect at the SA3 level was employed. To smooth unstable variances we used the generalized variance function [64,14,65] described in Section C.4.6 of the Additional File. More details can be found in Section C of the Additional File.

Priors
The Bayesian models described above are completed by the specification of priors. Given the complexity of the two models, in this work generic weakly informative priors were adopted based on preliminary analysis of the data [66]. In both models, all fixed effect coefficients were given N 0, 2 2 priors with intercepts given a student-t 0, 2 2 , df = 3 . We used N + 0, 1 2 and N + 0, 2 2 priors for all standard deviation terms in the stage 1 and stage 2 models, respectively. The mixing parameter in the BYM2 [63] random effect was given a Uniform (0, 1) prior (see Section C of the Additional File).
We conducted sensitivity analysis by using more, N 0, 1 2 , and less, N 0, 100 2 , informative priors for the fixed effects in both models. We also experimented with using exponential priors with rates of 0.5 and 1 for standard deviation terms. Finally, we examined model fit when using an informative Beta prior for the mixing parameter. We found that the model fit and prevalence estimates were unaffected by these prior changes. The chosen priors gave superior sampling efficiency and convergence.

Validation
For validation of the small area estimates, we adopted a dual approach, using both internal and external methods.

Internal benchmarking
Internal validation involved a fully Bayesian benchmarking procedure [31] that adjusts the results obtained in the stage 2 model by penalizing discrepancies between modelled and direct estimates. Unlike previous benchmarking approaches that adjust the point estimates only [15,67], Bayesian benchmarking adjusts the entire posterior -automatically accounting for benchmarking-induced uncertainty. See Section C.4.7 in the Additional File for details of Bayesian benchmarking.
In this work we simultaneously enforced two benchmarks referred to as "state" and "major-by-state". The state benchmark had seven groups which were composed of the states and territories of Australia (except the Northern Territory, which was not benchmarked due to ABS instruction [57]).
The major-by-state benchmark had 12 groups, composed of the interaction of the states and territories of Australia (except the Northern Territory) and dichotomous remoteness (major city vs non-major city). Thus, for each state, apart from Tasmania (where all areas were non-major city), and the Australian Capital Territory (where 96% of areas were major city), each SA2 was benchmarked differently depending on whether the area was in a major city or not.

External validation
External validation was performed by comparing the estimates to those from the SHAA at the PHA level and the overall trends observed in the modelled results with the general findings from other Australian health surveys conducted on specific sub-populations, such as states [68] or First Nations Australians [69]. It was not possible to match our definitions and age subset to those from the external data. Thus, comparisons were general in nature, assuming that general trends in risk factor prevalence would be evident.

Validation
The graphical summary presented in Section A of the Additional File provides an overview of the broader associations between the prevalence of cancer risk factors, the SES index, and remoteness. The figure summarises the trends available from published reports and/or digital platforms ranging across six selected Australian health surveys.
Due to the scope of the 2017-18 National Health Survey (NHS) [57], internal validation would not have been valid for very remote SA2s and those in the Northern Territory (n = 103) (see Section D.3 of the Additional File). Although these SA2s only accounted for approximately 1.5% of the 2017-2018 adult population, they were of particular interest in this work since no estimates currently exist for many of these very remote areas. Therefore, although external validation affirmed the validity and reliability of our estimates in general, it was particularly helpful in assessing the credibility of estimates for areas that could not be benchmarked.

Computation
We used fully Bayesian inference using MCMC via the R package rstan Version 2.26.11 [70]. Where possible we used the non-mean centered parameterization for random effects and the QR decomposition for fixed effects [71]. The stan code for the stage 1 and stage 2 models can be found on GitHub [72].
For the stage 1 model we used 1000 warmup and 1000 post-warmup draws for each of the four chains, feeding a random subset of 500 posterior draws from the stage 1 to the stage 2 model. For the stage 2 model we used 3000 warmup and 3000 post-warmup draws for each of four chains. For storage reasons we thinned the final posterior draws from the stage 2 model by 2, resulting in 6000 useable posterior draws.

Summaries and visualisation
Estimates from the benchmarked stage 2 model were reported in a variety of forms, including absolute, relative and classification measures. For point estimates we used posterior medians and for uncertainty intervals we used 95% highest posterior density intervals (HPDIs). We used the modelled proportions as the absolute indicator and odds ratios (ORs) as the relative indicator. The ORs for the tth posterior draw were derived as, withμ D being the overall direct estimate for the risk factor measure. An OR above one indicates that the SA2 has a prevalence higher than the national average.
In addition to using point estimates and credible intervals to summarize the ORs, we also used the exceedence probability (EP) [33,74,53].
Generally an EP above 0.8 (or below 0.2) is considered to provide evidence that the prevalence rate in the corresponding SA2 was substantially higher (or lower) than the national average, respectively [75]. Note that the exceedance probabilities calculated using either ORs or prevalence are identical.
To facilitate decision-making, we classified SA2s by assessing whether their individual and neighbor values (i.e. clusters) are significantly different to the national average. In this work, these classifications are called evidence classifications.
We followed the work by Gramatica et al. [76] and Congdon et al. [77], who used an adaption of the LISA (Local Indicator of Spatial Association) clustering approach [78] to identify clusters of areas with significantly high (or low) prevalence. The LISA clusters were computed from the posterior draws by first deriving the deviation of the prevalence from the national average, z i −μ D , and then calculating the spatial lag of the deviation, was a M -dimensional row vector and W * the row-standardized version of the adjacency neighborhood matrix described in Section C.4.4 of the Additional File.
To derive the evidence classification measure, SA2s were classified into one of four exclusive groups [77]: Any area classified as HC, H, L, or LC has an exceedance probability suggesting that the modelled prevalence is significantly different to the national average; HC or H denotes higher, while L or LC denotes lower. The difference between HC and H (or LC and L) is that the former provides an indication of clustering of areas, while the later only indicates significance of the area itself. If an area is not classified according to the criteria above (defined as None ("N")) the modelled estimate is not sufficiently different to the national average.
Code to produce subsequent plots is available on GitHub [72].

Prevalence
Large spatial variation in the proportion of cancer risk factors across Australia can be clearly observed in Figs. 4 to 6 and Section H of the Additional File. Slightly more heterogeneity of the point estimates was observed within major cities as a result of the much greater socioeconomic variation within these areas. For example, the range of principal component 1 (a proxy for SES that is unique to the SES index) was largest in major cities and inner regional areas, but 50% the size in remote and very remote areas.
Stratifying by risk factor, the results highlight interesting patterns and trends. A more thorough discussion of the result is given in Section F of the Additional File.
• Current smoking (Section H.2 in the Additional File): Spatial patterns show lower prevalence in major cities and less disadvantaged areas. Although very high prevalence estimates are observed in the very remote regions in the middle of the country, these estimates come with substantial uncertainty.
• Risky alcohol consumption (Section H.3 in the Additional File): The spatial patterns were inconsistent with the other factors, particularly in terms of the relationship between (higher) socioeconomic status and healthy behaviours. The results suggest that less disadvantaged areas have higher prevalence, which generally manifests in higher prevalence in major cities. Unlike other risk factors where prevalence estimates exhibit relative homogeneity within the SES index deciles and remoteness groups (see Section G of the Additional File), for risky alcohol consumption the estimates exhibit far greater heterogeneity for more disadvantaged areas in major cities.
• Inadequate diet (Section H.4 in the Additional File): The spatial patterns suggest less dependence on the SES index and remoteness than the other risk factors. Inadequate diet exhibits the lowest heterogeneity of the risk factors considered in this work.
• Body weight (Sections H.5 to H.7 in the Additional File): Similar spatial patterns are observed for the three measures. The prevalence was very strongly tied to remoteness with substantially lower prevalence almost exclusively occurring in major cities. Furthermore, the most notable differences in patterns between the estimates for obese and overweight/obese are found in major cities.
• Physical activity (Sections H.8 to H.9 in the Additional File): Similar spatial patterns are observed for the two measures. Lower prevalence of inadequate activity is observed in major cities and less disadvantaged areas.
The estimates demonstrate reliability, as around 97% of them possess coefficients of variation (CV) below 25%a widely accepted threshold for reliability [27]. Furthermore, the modelled estimates show considerable stability improvements over the SA2 direct estimates with a reduction in variability (measured by standard deviation) across Australia by an average factor of 3.3. The estimate uncertainty varied by risk factor, with current smoking having the highest median CV (17.4) and inadequate activity (leisure) having the lowest (2.6). The distribution of CV also varied by remoteness; the median CV for major cities (62% the survey data) was, on average, 1.8 to 3.1 times smaller than that for very remote areas.
To investigate the impact of the finer resolution, we derived PHA level CVs by taking the population weighted mean of the SA2 level estimates. The CVs of point estimates at the SA2 level range from 5% to 34% larger than point estimates at the PHA level across the risk factors. Similarly, by calculating and summarising the heterogeneity of SA2s within each PHA, we find that across the risk factors, the median PHA CV is between 1.5% to 9.1%. Of the PHAs composed of multiple SA2s, 10% have CVs > 15%. The large CVs indicate that the corresponding PHAs were highly heterogeneous, highlighting the benefits of using higher resolution estimates. Given the similar definitions for the obese risk factor measure, Fig. 7 compares the estimates used in this work and that of the SHAA.
Section G of the Additional File provides more plots describing the modelled results, including how they vary by the SES index and remoteness. An interactive exploration of the modelled results will be made available in the Australian Cancer Atlas 2.0 [79], planned for release in late 2023. Table 3 summarises the number of evidence classifications for each risk factor measure. Fig. 8 stratifies these by remoteness. A similar stratified plot for the SES index is found in Section G of the Additional File.

Evidence classifications
Across most risk factors many more HC areas are identified than LC areas. For example, for risky waist circumference, around 783 SA2s are classed as HC, while only around 397 are classed as LC. We observed that HC or H evidence classifications are generally found in the most disadvantaged areas, while L or LC areas are more likely in the least disadvantaged areas in major cities.  Table 1) for 2221 SA2s across Australia. The top plot gives the posterior median of the odds ratios (OR). ORs above 1 indicate that the prevalence is higher than the national average. The bottom plot gives the exceedance probabilities (EPs) for the ORs described in Section 3.5. The map includes insets for the eight capital cities for each state and territory, with black boxes on the main map indicating the location of the inset. Note that some values are lower (or higher) than the range of color scales shown; for these values, the lowest (or highest) color is shown. Grey areas were excluded from estimation due to the exclusion criteria described in Section 2. Black lines represent the boundaries of the eight states and territories of Australia.  Table 1). For more details see the caption for   [6]. The maps include insets for the eight capital cities in each state and territory, with black boxes indicating their location.
Note that some values are lower (or higher) than the range of color scales shown; for these values, the lowest (or highest) color is shown. Grey areas represent no estimates, and black lines denote state and territory boundaries. Our estimates and SHAA's use similar but not identical definitions, with our values reported as proportions and SHAA's as age-standardized rates converted to proportions for comparison.  The evidence classifications revealed several interesting trends. For the physical activity risk factor measures, a larger proportion of the areas in major cities were classified as HC or H as opposed to LC or L. For inadequate physical activity, the HC classifications favour the most disadvantaged areas. The weight risk factor measures exhibited different trends with a relatively even distribution of evidence classifications in major cities. Furthermore, almost all areas classified as LC or L occurred in less disadvantaged areas. As mirrored in the maps, the evidence classifications for smoking suggest a very strong correlation with remoteness and SES; almost all the LC or L classifications occur in major cities and less disadvantaged areas. Inadequate diet has the smallest number of evidence classifications (1155 out of 2221), with the largest proportion of them being LC areas in major cities and less disadvantaged areas. The results for risky alcohol consumption suggest that less disadvantaged areas have higher proportions of risky alcohol consumption; a trend unique to this risk factor measure.

Validation
Our SA2 level modelled estimates corroborated well with the trends reported from the external surveys and SHAA estimates (Section A of the Additional File). The strongest evidence of agreement was for current smoking and obese as the risk factor measure definitions from this work and those from the SHAA were similar (see Fig. 9 and Fig. 7). Moreover, the trends by socioeconomic status and remoteness from the external surveys agreed strongly with the modelled estimates.
The level of agreement with external data was generally similar for benchmarked and non-benchmarked SA2s, with non-benchmarked SA2s having considerably higher uncertainty on average as expected (see Section G of the Additional File). This was particularly true when deriving a single estimate for the non-benchmarked SA2s (see Fig. 10). Our model-based estimate of this quantity was generally very different (often higher) and had far greater uncertainty than the direct estimate which, due to the scope of the survey, is advertised as non-robust [57]. This disparity was expected as the direct estimate did not capture any very remote areas, with most data coming from the capital of the Northern Territory, Darwin. In contrast, our model-based estimate captured SA2 estimates from Darwin and very remote Australia where population health is generally poorer [80].
The fully Bayesian benchmarking approach performed well, with minimal changes in point estimates. We also observed a relatively even spread of increased and decreased posterior uncertainty, ranging from about a 13% increase to a 6% decrease in the width of HPDIs when the benchmarks were accommodated (see Section D of the Additional File).

Discussion
This work improves the spatial resolution and reach of previously published cancer risk factor estimates in Australia. While the estimates highlight broadly similar findings as those from the SHAA, they provide greater resolution and reach allowing for more granular exploration of the spatial disparities (see Fig. 7). This is particularly pertinent due to the heterogeneity of the component SA2s within each PHA in terms of population size, socioeconomic status and remoteness.
By improving the reach of the previously published cancer risk factor estimates, the estimates in this work uniquely enable the exploration of spatial disparities in very remote areas of Australia. As expected, the very remote areas have far greater uncertainty than those in major cities (CVs greater than 3 times higher). Nevertheless, by utilising the estimates and their uncertainty measures policy makers will have the capability to more effectively allocate health interventions and resources to these disadvantaged areas and triage areas where more data should be collected in the future to improve the quality of small area estimates.
The cancer risk factor estimates generated in this work reveal substantial spatial disparities in cancer risk behaviours across Australia, with higher prevalence of high risk behaviours generally occurring in more remote areas. While the prevalence of most cancer risk factors is higher in areas of lower SES, the spatial patterns for risky alcohol consumption demonstrated the opposite effect. Point estimates for risky alcohol consumption and current smoking exhibited the most heterogeneity across Australia, while those from the physical activity measures exhibit the least. The distribution of the point estimates are mostly consistent across states and territories of Australia.
Although generating prevalence estimates and their uncertainty intervals are useful in a variety of applications, using them to visualize which areas are substantially different to the national average can be difficult as the two components must be considered jointly. By further classifying the estimates according to their posterior probabilities, we were able to streamline this process. Classifications, such as those used in this work, are pivotal in developing targeted interventions as they enable policymakers to quickly identify areas, or groups of areas, with substantially higher (or lower) prevalence. Each point also has a 95% highest density interval and 95% confidence interval from this work and the SHAA, respectively. The black diagonal line represents when the two axis are equal. Note that the definitions used for our estimates and those from the SHAA are similar but not identical. Furthermore, we report proportions, while the SHAA reports age-standardised rates which have been converted to proportions for comparison.
Our Bayesian methodology, along with its associated exceedance probabilities and evidence classifications, provides insights that cannot easily be attained via the estimates from the SHAA. Although the spatial patterns of the evidence classifications (see Section 4.2) vary by risk factor, a consistent pattern was that areas with lower than average prevalence of risk factors (classified as LC or L) were almost exclusively located in major cities. Although there were areas with higher than average prevalence (HC or H) in major cities these were often less common, except for the physical activity risk factors where about half were higher and lower than the average prevalence.
Although this applied work represents a significant step in the ongoing improvements in cancer prevention in Australia, it has some limitations. Firstly and most critically, like previous research [47,81], most of the risk factor measures used were based on data derived from self-reports which are highly susceptible to various biases [82]. Furthermore, some 2017-18 NHS questions focused on behaviour from the previous week (e.g. alcohol, physical activity), while others on a usual week (e.g. fruit and vegetables consumption, smoking).
Given the nature of the survey questions, caution must be exercised in using the risk factor measures presented herein. While the estimates provide insights into the spatial variation, due to the ecological fallacy [83] and the often varying lag time between exposure (to a risk factor) and a cancer diagnosis [2], the estimates here cannot be used to establish individual-level associations between risk factors and cancer incidence. Moreover, as these estimates are derived from cross-sectional data, they do not enable inference into lifetime risky behaviour or causal relationships with cancer. Secondly, the accuracy of our estimates are conditional on the 2017-18 NHS exclusions (very remote areas, discrete Aboriginal and Torres Strait Islander communities and non-private dwellings [57]). Without data for these subpopulations, there is currently no way to assess the impact of these exclusions on modelled estimates from this survey. Thirdly, while the SHAA provides estimates by sex [6], our study, constrained by the sparsity of the survey data at the SA2 level, did not allow for a similar disaggregation. Given the evidence that health behaviours can depend on sex, the non sex-specific estimates generated in this work may suffer from inadvertent smoothing toward the mean.
Finally, the quality (both in terms of bias and variance) of small area estimates can always be improved by using larger surveys. Although we used the best survey data available, in the future, linkage of multiple surveys could provide much larger sample sizes across Australia, enabling the production of higher resolution estimates.
In terms of future research directions, one approach could involve developing distinct models for each of the eight risk factor measures. That is the linear predictor for each risk factor measure could have different sets of covariates, random effect structures or even include non-linear relationships via splines. Alternatively, future work could model the numerous risk factors jointly by leveraging univariate stage 1 models, followed by a multivariate spatial stage 2 model [84].

Conclusions
Using a Bayesian two-stage small area estimation model we have, for the first time, generated and validated point estimates of the prevalence of eight cancer risk factors, and measures of their uncertainty, at the SA2 level across Australia. By aggregating the estimates, we have shown that they are very similar to those given by the SHAA [6], external surveys [85,86,87,88,89,90] and previous research on how area level socioeconomic status and remoteness relate to healthy behaviours [43]. This work improves the spatial resolution and reach of previously published cancer risk factor estimates in Australia with the goal of informing more targeted cancer prevention strategies. To help disseminate the results of this work, they will be made available in the Australian Cancer Atlas 2.0 [79], scheduled for release in late 2023. Since the health factors used in this study are also common risk factors for other diseases, the prevalence estimates generated here may be useful in other disease modelling applications both in Australia and internationally.

Acknowledgments
This study has received ethical approval from the Queensland University of Technology Human Research Ethics Committee (Project ID: 4609) for the project entitled "Statistical methods for small area estimation of cancer risk factors and their associations with cancer incidence". Ethics approval was received for the inclusion of the modelled estimates from the 2017-18 NHS into the Australian Cancer Atlas (Griffith University Human Research Ethics Committee (EC00162) Ref:2018/052). Approval was received to access the secure ABS Datalab for analyses of the NHS data (Project ID: 2021-033 QUT).
We thank the Australian Bureau of Statistics (ABS) for designing and collecting the National Health Survey data and making it available for analysis in the DataLab. The views expressed in this paper are those of the authors and do not necessarily reflect the policy of QUT, CCQ or the ABS.

Competing interests
The authors declare that they have no competing interests.

Additional Files
The Additional File (appended to this paper) contains additional material including further details of the data and model, and more plots, maps, and results.
[9] Kamyar Mansori, Masoud Solaymani-Dodaran, Alireza Mosavi-Jarrahi, Ali Ganbary Motlagh, Masoud Salehi, Alireza Delavari, and Mohsen Asadi-Lari. Spatial inequalities in the incidence of colorectal cancer and associated factors in the neighborhoods of tehran, iran: Bayesian spatial models. A Other sources of Australian health data By evaluating the extent to which our estimates at the SA2 level aligned with those from the SHAA and the broader socioeconomic status (SES), and remoteness trends observed in external surveys, we were able to validate the credibility and generalizability of our results. In this section, we provide details about these external surveys and their appropriateness with respect to the validation procedure carried out in this work. Note that our analyses did not involve any form of record linkage.

A.1 State surveys
Although the National Health Survey is the optimal survey for this study given its national coverage and breadth of health data collected, each state and territory of Australia also independently conduct surveys for health surveillance purposes.
New South Wales (NSW) NSW, the state with the largest 2016 adult population in Australia, conducts annual population health surveys by telephone. These surveys generally aim for a sample size of 13,000 people from the state [1]. The NSW Ministry of Health provides data on cancer risk factors stratified by year, remoteness, and SES status via their interactive dashboard, HealthStats NSW [2]. As shown in Fig. 1, trends in cancer risk factors from NSW contribute a large proportion of the data we used to externally validate our modelled estimates.
Victoria (VIC) VIC, the second largest state in terms of population, also conducts annual health surveys, which are administered using computer assisted telephone interviews [3]. In 2017, the survey collected data on 33,654 adults. Unlike other states, the Victorian Agency for Health Information provides cancer risk factor prevalence estimates for small areas called Local Government Areas (LGAs) [4], which are non-ASGS boundaries that continue to be used by the Australian government. Unfortunately data are not reported by remoteness or SES.
Queensland (QLD) QLD, the second largest state in terms of population, conducts annual health surveys, which are also administered by telephone. Around 12,500 adults participate each year [5]. Although not as flexible as the NSW data system, Queensland Health provide an online tool called the Queensland survey analytic system that provides prevalence estimates for cancer risk factor by LGAs, PHNs, remoteness and SES. Although LGA estimates are available, for the regions we would be interested in -far north Queensland for example -data are not available or not releasable.
Western Australia (WA) WA also conducts annual health surveys administered using computer assisted telephone interviews. In 2018, the sample included 5,750 adults aged 16 years and over and boasted an average participation rate of approximately 90% [6]. These data would have been important for validation as 5% of the states SA2s are very remote. Unfortunately there is no publicly available dashboard summarising the Western Australian survey results. In published reports, cancer risk factor data are not stratified by small area, remoteness or SES, which limits their usefulness for validation purposes.
South Australia (SA) SA collects health data monthly via telephone interviews. In one year, around 7,000 South Australians are interviewed for the survey [7]. Unfortunately there is no publicly available dashboard summarising the SA survey results. In published reports prevalence of cancer risk factors are sporadically stratified by remoteness, and SES.
Northern Territory (NT) The first state-wide health survey in the NT was due to occur in 2022 with the goal of sampling 2000 residents [8]. At the time of writing, the most relevant health data for the state is available in the NHS [9] (6.7% of participants were from the NT) or National Aboriginal and Torres Strait Islander Health Survey [10] (18% of participants were from the NT).
Tasmania (TAS) TAS has been collecting health data triennially since 2009 via computer assisted telephone interviews. The 2019 Tasmanian Population Health Survey sampled 6300 adults. The published report presented detailed prevalence estimates by remoteness, and SES [11].
Australian Capital Territory (ACT) The ACT has been conducting an annual General Health Survey since 2007 via computer assisted telephone interviewing [12]. In 2019, the survey collected a wide range of health data from 2002 adults. Given its population (only 1.7% of the 2016 adult population) and geographical size (less than 0.03% of Australia's landmass), data from the ACT were not informative for external validation in this research. The ACT is also relatively homogeneous in terms of remoteness and SES. Of its 114 SA2s, 110 are major cities and 102 have SES index deciles above 5 (denoting lower levels of disadvantage).

A.2 Other surveys
Other nationwide surveys have health data for either a subset of the population (like the National Aboriginal and Torres Strait Islander Health Survey (NATSIHS) [10]) or a subset of variables (such as the National Drug Strategy Household Survey (NDSHS) [13]).
The NATSIHS is a hexennial survey which collects social and health data from First Nations Australians with the goal of reporting statistics at the national and remoteness level [10]. The 2015 survey collected data on 11,178 First Nations Australians living in private dwellings across Australia. These surveys collect information on all the risk factors considered in this research.
The NDSHS is the leading survey of licit and illicit drug use in Australia. In 2019, the household-based survey collected data from 22,274 people aged 14 and over [13]. Relevant to this research, the NDSHS provides data on alcohol consumption and smoking. Figure 1: Schematic summarizing the key findings from external health surveys conducted by state governments (New South Wales (NSW) [2], Queensland (QLD) [5], South Australia (SA) [7] and Tasmania (TAS) [11]) and the Australian Bureau of Statistics (The National Aboriginal and Torres Strait Islander Health Survey (NATSIHS) [14] and National Drug Strategy Household Survey (NDSHS) [13]). More details can be found in Section A and D. Each column corresponds to a specific survey, while rows represent different cancer risk factors, each with varying definitions across the surveys. Within each risk factor, we provide a summary of trends based on remoteness and socioeconomic status (SES), represented by distinct shapes. The colors of the shapes indicate the direction of the trends: red indicates a higher prevalence for remote areas compared to urban areas, and for socioeconomic disadvantage compared to socioeconomic advantage. For instance, a red triangle signifies a higher prevalence of the risk factor in more remote areas, while gray indicates insignificant or trivial differences between remoteness categories. Conversely, green denotes a lower prevalence of the risk factor in more remote areas. The schematic only includes trends supported by available data from digital platforms or publicly available reports.

B Risk factor details
This section provides further details for the eight risk factor measure definitions used in this work. A summary is provided in Table 1 in the main paper.

B.1 Smoking
According to Whiteman et al. [15] and the 2015 Australian Burden of Disease study [16] smoking contributes the highest proportion of total cancer burden in Australia. Similar to the definition used in the SHAA [17], current smoking was defined as those who reported to be daily, weekly or less than weekly current smokers, and had smoked at least 100 cigarettes in their life. Even though ex-smokers experience greater risk than non-smokers for some cancers [18], including ex-smokers in the definition would create a measure of lifetime smoking which deviates from the cross-sectional nature of the other risk factor measures.

B.2 Alcohol
The revised 2020 guidelines by the NHMRC stipulate that persons should drink no more than 10 standard drinks a week and no more than 4 standard drinks on any one day [19]. The previous 2009 NHMRC guidelines stipulated that adults should drink no more than 2 standard drinks on any day [15]. Both versions of the guidelines emphasize that the recommendations do not represent a "safe" or "no risk" level of alcohol consumption [20].
Although Cancer Australia [21] and overseas governments, such as the US [22], recommends alcohol consumption in line with the 2009 NHMRC guidelines, Cancer Council Australia [23] recommends the more recent 2020 guidelines. In this work, we defined risky alcohol consumption as persons who did not meet the 2020 guidelines.
In the 2017-18 NHS, detailed alcohol consumption data were based on self-reports. Participants were asked about their alcohol consumption for the most recent three days (from the preceding 7 days) that they drank alcohol. Results cannot indicate lifetime alcohol behaviour.

B.3 Diet
Although a variety of diet-related factors, such as fruit, vegetables, meat, fibre and wholegrains, have been found to contribute to the risk of developing cancer [24,16,15], the 2017-18 NHS collected diet information for fruit and vegetables only [25]. Participants of the survey were asked to report the number of serves of fruit and serves of vegetables they usually ate each day.
Both Whiteman et al. [15], the SHAA [17] and Cancer Australia [21] use the 2013 NHMRC guidelines [26] for fruit and vegetables, which stipulates two serves (equivalent to 300g) of fruit and five serves (equivalent to 375g) of vegetables per day.
While we acknowledge the benefits in producing separate maps for fruit and vegetable consumption, we found that only 7.5% of Australians met the guidelines for vegetable intake [27]. Modelling these sparse data could give very unstable prevalence estimates [28], a possible reason why the SHAA does not provide estimates for vegetable consumption alone.
To address this issue, we jointly modelled whether participants met the guidelines for fruit or vegetables. Specifically, we assigned a value of one, indicating inadequate diet, to persons who did not meet either guideline.

B.4 Weight
According to Whiteman et al. [15], 3.4% of cancer diagnoses in Australia are attributable to being overweight or obese. Following the SHAA, we used both weight-based risk factor measures according to the conventional Body Mass Index (BMI) [29,30], and a measure related to waist circumference [31].
BMI is calculated using a participant's height and weight. While self-reported height and weight are commonplace [32,33] and useful to approximate BMI [34,35], the reported values can be subject to bias [36]. Measurement of weight, height and waist circumference was a voluntary section of the NHS interview. To increase the applicability (reduce the number of missing values) of these weight-based measurements, any participant who refused to be measured had their weight, height and waist circumference measurements imputed using a hot-decking method where they received weight, height and waist circumference measurements from another very similar participant based on demographic characteristics, such as age, sex, state, self-perceived body mass, exercise, cholesterol and self-reported BMI [25]. Although approximately 40% of the measured data were imputed, measured values can have significantly less bias than self-reported measures [32]. In this work, we used the measured data.
Overweight/obese and obese was defined as persons with a BMI greater than or equal to 25 and 30, respectively.
Following Cancer Australia [21] and the SHAA [17], we defined risky waist circumference as measurements of 94cm and 80cm or more for men and women, respectively [20]. Note that these waist circumference cutoffs are only appropriate for adults, so for this risk factor we limit the dataset to all persons 18 years and older. Assuming that the single-year age distribution in any of the age group was uniform, we estimated that the population of 18-19-year olds was 40% of the 15-19-year old population.

B.5 Physical activity
The national Department of Health (DOH) guidelines for physical activity, published in 2014 [37], closely mirror those given by the World Health Organization [38]. The DOH guidelines stipulate that each week adults (those between the ages of 18 and 64) should either do 2 1/2 to 5 hours of moderate-intensity physical activity or 1 1/4 to 2 1/2 hours of vigorous-intensity physical activity or an equivalent combination of both. In addition, the guidelines recommend muscle-strengthening activities at least 2 days each week. The DOH guidelines also provide specific recommendations for children (5 to 17 years), older persons (65 years and older) and pregnant women.
The benefits of physical activity on overall health, including cancer, are best described by considering the total volume of activity, which is calculated by considering the frequency, duration and intensity of exercise [39]. In a systematic review of physical activity and cancer outcomes, McTiernan et al. [40] found that most studies related leisure physical activity to cancer, whereas non-leisure (or work-related) physical activity was not classed as physical activity. Elsewhere [41], physical activity has been defined as any movement that results in energy expenditure, which includes deliberate exercise or sport, incidental movement or work-related activity.
To capture both leisure only and all activity, we opt to allow workplace physical activity (if of sufficient intensity) to count toward meeting the 2014 DOH guidelines. Inadequate activity (leisure) was defined as those who did not meet the guidelines based on their reported leisure physical activity alone in the week before the survey interview. inadequate activity (all) was defined as those who did not meet the guidelines based on their reported leisure and workplace physical activity in the week before the survey interview. In this work, the physical activity measures were derived from the ABS created variables that accommodated the guidelines across age groups.

C Model details
This section provides details of the two-stage model described in Section 3 of the main paper. Figures 2 and 3 (in the main paper) provide graphically summaries of the components of the two-stage model. Initially we describe the principal components analysis that was carried out before modelling.

C.1 Survey weights
To help correct for sampling bias and promote design-consistency, survey practitioners provide survey weights. In this analysis, we used two versions of the survey weights, w raw ij , provided by the ABS [25,42].
was used for direct estimation, andw was used in the stage 1 model.

C.2 Principal components analysis
As mentioned in Section 2.2.2 in the main paper, principal components analysis was conducted on 84 continuous census covariates represented as proportions or averages. The input data were scaled and centered prior to the analysis. Principal component 1 (PC1) to PC6 were retained as they accounted for approximately 62% of the variation. PC1 to PC6 captured approximately 24%, 11%, 11%, 7%, 5% and 4% of the variation, respectively. All remaining PCs contributed less than 3% of the total variation each with 83% of these contributing less than 1%.
PC1 was very strongly correlated (Pearson correlation of 0.86) with the SES index deciles (see Fig. 2), obtained from the ABS Socio-Economic Indexes for Areas product. Despite the correlation, model performance (assessed using metrics described in Section E) was improved when both were included. Including both was pragmatic as the SES index is constructed from 16 purposely-selected census measures of relative disadvantage, while PC1 to PC6 captured a wider range of census data. Moreover, the SES index is included as a categorical variable with ten groups, whilst the PC1 to PC6 variables are included as continuous quantities. A further benefit of including the SES index (which captured 43% of the variation) is that it accommodates some information not captured by the principal components analysis including characteristics of dwellings (internet connection, number of cars, etc), disability and equivalised income [43].

C.3 Stage 1: Individual level model
In this section we describe the components of the stage 1 linear predictor, for the individual level probability of y ij = 1 for sampled individual j = 1, . . . , n i in SA2 i = 1, . . . , m. Details on how the model was selected are given in Section E.1.

Fixed effects
The fixed effects were included via the individual level design matrix, X ij , and corresponding coefficients, β. We used the following individual level categorical covariates in the stage 1 models: age, sex and their interaction; registered marital status; hypertension status; Kessler psychological distress score; occupation; language spoken at home; and depression status. See Table 1 for the categories of the individual level covariates and the reference groups used. Along with the individual covariates, we also included seven SA2 level fixed effects; SES index deciles (IRSD) as a categorical covariate with 10 groups and the first six principal components (PC) (derived from the 2016 census data, see Section C.2).
Random effects Unstructured individual level random effects (ϵ ij ) and area (SA2) level random effects (e i ) were applied. In addition to these, by borrowing ideas from MrP [44], we included two hierarchical random effects based on categorical covariates that were themselves derived from the interaction of numerous individual level demographic and health covariates. We derived a demographic-health (DH) categorical covariate from the interaction of sex, age, self-assessed health, qualification, and high school completion status. See Table 1 for the categories of these covariates. The median (IQR) samples size in each DH group was 7 (3,20 We also derived a categorical covariate (δ r[ij] with r = 1, . . . , 16 levels) from the interaction of the binary risk factor outcomes not directly associated with the risk factor being modelled. See Table 2 for more details on this covariate, termed the non-outcome risk factor (NORF) covariate. For example, when modelling waist circumference, δ = (δ 1 , . . . , δ 16 ) was constructed using risky alcohol consumption, inadequate activity (all), inadequate diet and current smoking. This variable was identical for risky waist circumference, obese and overweight/obese, while for other outcomes, overweight was one of the binary risk factors included.
The probability that y ij = 1 for sampled individual j in SA2 i is denoted by π ij . Pseudo-likelihood was used to ensure the predictions from the logistic model were approximately unbiased under the sample design [45,46]. The remaining notation is described in Section 3.1 of the main paper.
Stage 1 model The stage 1 model used for all risk factor measures is, where we represent the pseudo-likelihood for a probability density, p(.), as p (y ij )w ij [42]. Note the fixed variance for ϵ ij ; an explanation is given in Section E. Details on the priors used are given in Section 3.2 in the main paper.

C.4 Stage 2: Area level model
In this section we describe the components of the stage 2 linear predictor, for SA2 i = . . . , M , where M = 2221 is the total number of areas. Details on how the model was selected are given in Section E.2.

C.4.1 Fixed effects
The design matrix of fixed effects (Z i ) for the second stage model with corresponding coefficients (Λ) included two categorical covariates: the SES index deciles (IRSD) and remoteness. Apart from the six principal components (introduced below), no other specific census variables were found to improve model fit.

C.4.2 Fixed effects with varying coefficients
The second component of the linear predictor for θ i was the six continuous covariates, PC1 to PC6. Australia is highly decentralised, meaning that area level statistical relationships may be very different in major cities as opposed to very remote areas. To incorporate this, we allowed the fixed effect regression coefficients for PC1 to PC6 to vary according to remoteness (major cities, inner regional and outer regional to very remote).
The principal components values for SA2 i are denoted by G i , a row-vector of length six. The regression coefficients specific to the rth remoteness category for the ith SA2 are denoted by Γ r[i] , a column-vector of length six. Although we explored partial pooling for the regression coefficients, because we only used three remoteness groups we found better performance and convergence when using independent priors; treating them as fixed effects. As described in Section 3.2 in the main paper, fixed effects were given generic weakly informative priors. Thus, Γ r[i] ∼ N 0, 2 2 .

C.4.3 Estimates from the SHAA
The third component of the stage 2 linear predictor was modelled prevalence estimates from the SHAA at the PHN level (see Section D). The data from the SHAA is provided as age-standardised rates per 100 people with 95% confidence intervals. We used these data to derive proportion estimates and their associated standard errors, before transforming the estimates to the unconstrained scale.
The logistic transformed SHAA estimate and variance for SA2 i are denoted byγ i and v (γ i ), respectively. Note thatγ i andγ k (and corresponding variance) were identical if SA2 i and k were within the same PHN.
To accommodate the variance of the estimates from the SHAA, we assumed classical measurement error [47], whereby the model variance (or error) is independent of the true value. Thus, where γ i is assumed to be the true logistic transformed SHAA estimate. Note that other work models the true values using spatial priors [48,49,50]. However, because there are few PHNs compared to SA2s, we used an independent weakly informative prior, N 0, 2 2 .
The true SHAA estimate, γ i , was included as an external latent field in the linear predictor for θ i , with α (the coefficient for γ i ) controlling the influence of the external latent field on the modelled estimates. Table 4 shows which of the variables from the SHAA was used as the external latent field for each of our risk factor measures.

C.4.4 Random effects
The final components of the stage 2 model were random effects. Many studies have illustrated the benefits of accommodating the spatial structure of the small areas in SAE models [51,52,48,53,54]. Although others have used conditional autoregressive (CAR) [55] or simultaneous autoregressive (SAR) priors only, Gomez-Rubio et al. [56] argues that including a structured and unstructured random effect provides a useful compromise between accurate small area estimates and their variances.
We used the BYM2 spatial prior [57] (ζ i ) with mixing parameter, ρ ∈ [0, 1], scaling factor, κ, variance parameter, σ 2 ζ , and M × M adjacency matrix, W. The BYM2 prior is a linear combination of a unit-scale intrinsic conditional autoregressive (ICAR) prior [55] and a standard normal. The BYM2 prior, denoted as BYM2 W, κ, ρ, σ 2 ζ , is As is common in disease mapping [58], we used the binary contiguous specification for W where W ik = 1 if SA2 i and SA2 k are neighbors and zero otherwise. To reduce the complexity of our model we made several manual changes to the weight matrix.
• We ensured the neighborhood structure was fully connected [57] by treating the eastern and western SA2s at the top of Tasmania as neighbors of the furthest south SA2s in Victoria.
• Donut SA2s, or areas with only one neighbor (n = 87), were also assigned the neighbors of their neighbor. The final weight matrix assigned 93% of SA2s more than 2 neighbors.
• Although Jervis Bay is classified as an "Other Territory" by the ABS, we altered its SA2 code to include it as part of NSW.
Following the recommendations by Gomez-Rubio et al. [56], Mohadjer et al. [59] and Banerjee et al. [60], the ICAR prior for s = (s 1 , . . . , s M ) was declared for all areas and thus the s i 's for non-sampled areas are implicitly imputed during MCMC. We used the Stan implementation of the ICAR prior given by Morris et al. [61].
Given that we did not export, explore or include SA3 level census covariates, we employed a random effect at the SA3 level as well [56]. Let η h[i] be the random effect for SA3 h. We found no discernible improvement in model fit when using spatial priors at the SA3 level, so we reverted to a standard normal prior N 0, σ 2 η instead.

C.4.5 Stage 2 model
The stage 2 model used for all risk factor measures was, where the prevalence estimate for the ith SA2 was given by the posterior distribution of µ i = logit −1 (θ i ). The remaining notation and details on the priors used are given in Section 3.2 in the main paper.

C.4.6 Generalized variance functions
As discussed in our previous work [28], the stage 1 sampling variances, ψ S1 i , can be unrealistically low for unstable areas. To correct for this we used generalized variance functions (GVF), which are commonly used in area level SAE modelling to smooth or impute unstable sampling variances [62,63,64]. The GVF used in this work is a Bayesian linear model, fitted to the stable areas, and is a generalisation of the GVF used by Das et al. [65] to a fully Bayesian framework [66].
Let L be the design matrix and ω the corresponding regression coefficients for the linear model. We used the log of the SA2 sample size, the log of the SA2 population, PC1 and the posterior median ofθ S1 i as covariates. The GVF is, During MCMC we impute values for the unstable S1 sampling variances via exp L i ω + 0.5σ 2 gvf 2 [65].

C.4.7 Bayesian benchmarking
Let C D k and v C D k be the direct Hajek [67] estimate and sampling variance for benchmark k = 1, . . . , K. The goal of internal benchmarking in this work was to ensure that the population-weighted modelled estimate, was in approximate agreement with C D k . Note that S k denotes the set of SA2s in benchmark group k. Fully Bayesian benchmarking takes the form, where p > 0 was a discrepancy measure used to assert the desired level of concordance. We set p = 0.5 for both the state and major-by-state benchmarks. Note that benchmarking was not used during model selection and was only applied once a final model had been selected.

D Additional data details D.1 Population Health Areas (PHA)
There is substantial heterogeneity of the component SA2s within each PHA in terms of population size, socioeconomic status and remoteness. For example, 71 (6%) and 491 (42%) PHAs are composed of SA2s with different remoteness categories and different SES index deciles, respectively. The average standard deviation of SA2 population sizes within each PHA is 3254. The affect of aggregating these heterogeneous SA2s into PHAs is a loss of information, where the extent to the loss is somewhat dependent on the number of SA2s within each PHA. In 2016, there were 451 (39%) and 242 (21%) PHAs with 2 and more than 2 component SA2s, respectively.
D.2 Estimates from the SHAA As described in Section 2.2.3 of the main paper, we also obtained prevalence estimates and measures of uncertainty for six risk factors from the SHAA [17] at the Primary Health Network (PHN) and PHA level for adults. There are 31 PHNs across Australia, with each comprising multiple SA2s (median of 72 and IQR of 53 to 96) and varying population sizes (median 505000 and IQR of 401000 to 808000, averaged over the 2017-2018 adult population). The SHAA's estimates and their respective uncertainties were included in the models and thus each SA2 was assigned to a PHN using ABS concordance files. To obtain a single value for any SA2 that overlapped several PHNs, we took the weighted mean of the PHN estimates using the given concordance ratios. Note that the SHAA does not provide estimates for the Western Queensland PHN. Based on its similar remoteness characteristics, estimates for this PHN were copied from those from the Northern Territory PHN. Year 10 or equivalent, At most year 9 or equivalent  SHAA variable Definition Current smokers Those who were classed as current smokers. Risky alcohol consumption Those who consumed more than two standard alcoholic drinks per day on average.
Adequate fruit intake Those who consumed at least 2 serves of fruit per day on average. Obese Those with a BMI greater or equal to 30. Overweight Those with a BMI that was between 25 and less than 30. Inadequate physical activity Those who undertook low, very low or no exercise in the week prior to the survey.

D.3 Validation for very remote regions and the Northern Territory
The ABS warns that direct estimates for the Northern Territory (NT) could be inaccurate. This is because 20% of the population of the NT live in very remote or discrete First Nations Australian communities which were purposely excluded from the sampling frame [25]. According to data from 2006, the NT had the highest proportion of First Nations Australian people residing in discrete communities, 41,681 (45%) [69]. Of the 40 SA2s across Australia with populations that were composed of over 25% First Nations Australian people, 30 (75%) were very remote and 18 (45%) were in the NT, respectively.
Furthermore, given the following warning from the ABS [25], . . . the estimates from the survey, do not (and are not intended to) match estimates of the total Australian estimated resident population (which include persons living in Very Remote areas of Australia and persons in non-private dwellings, such as hotels) obtained from other sources we could not use internal validation (benchmarking) for very remote SA2s. Table 5: Descriptive statistics comparing the SA2 level prevalence estimates from the benchmarked and non-benchmarked models. The first column summarises the mean absolute relative difference (MARD) between the posterior median proportions from the benchmarked and non-benchmarked models. The second column of the table compares the width of the 95% highest posterior density intervals (HPDI) by dividing the non-benchmarked intervals sizes by the benchmarked interval sizes. Thus, a value greater than 1 indicates that the benchmarked model provides narrower HPDIs. We derived the ratio for all SA2 areas, but only display the median and interquartile range in the

E Model building
As described in Section 3.1 in the main paper, a single model specification was chosen to be applied to all eight risk factor measures.

E.1 Stage 1 model
To guide model selection for the stage 1 model, we used the smoothing ratio (SR) and area linear comparison (ALC) metrics, both of which indicate the level of concordance between the observed and smoothed data from the stage 1 model [28]. The SR and the ALC smoothing metrics achieve this by comparing the observed individual level data to the predicted probabilities and the area level direct estimates to the stage 1 estimates, respectively. Higher values of both are preferred. We used the posterior median of the SR, which for posterior draw t is given by, whereμ D is the overall prevalence. The ALC is equal to the regression coefficient when we regress the posterior median ofμ S1 i onμ D i with weights 1/ψ D i . For survey-only and census fixed effects, a frequentist weighted logistic model with no random effects was used for variable selection. The primary focus in selecting these was to maximize both the ALC and the SR, with AIC and BIC considered as secondary criteria. When validating frequentist decisions with Bayesian inference we used leave-one-out cross-validation (LOOCV) via Pareto-importance sampling [70]. In general the selected variables were consistently good predictors across all the risk factor measures. See Table 6 and 7 for model metrics for the stage 1 model for all risk factor measures.
To select the variables to include in the demographic-health categorical covariate, we explored a large range of possible interactions by fitting the resulting categorical variable using frequentist weighted logistic mixed models. The selected set of covariates showed persistent benefits, in terms of the ALC and the SR, across all risk factor measures. Similar to our previous work [28], we found that fitting the NORF categorical variable as a random effect improved model fit across the board (see Section D).
The fixed standard deviation of the residual error (σ e ) was utilized to address stage 1 models in cases where the SR or the ALC was too low. In a brief simulation experiment (See Section I) we found that optimal performance of the area level MRRMSE and MARB generally occurred when the 0.55 < ALC < 0.75 and the 0.4 < SR < 0.7, with performance atrophy outside these bounds. We found that values in the lower end of these bounds generally provided narrower and more reliable credible intervals.
As discussed in our previous work [28] and corroborated by the simulation experiment of the current research (see Section I), although the stage 1 model is designed to smooth the observed data, oversmoothing (e.g. when the SR or the ALC are close to zero) can significantly affect the model performance. Thus, the residual error scale was used to purposely overfit the first-stage model in order to improve the predictions from the second-stage model. In this work, we set σ e = 2, which provided a range of the SR from 0.37 to 0.45 and the ALC from 0.55 to 0.70 across the risk factor measures.

E.2 Stage 2 model
Unlike other SAE applications where predictive accuracy can be reasonably assessed via scoring rules [54] or leaveone-out cross validation [70], given the unstable nature of the direct SA2 level estimates, we assessed performance of our stage 2 model by comparing direct and modelled estimates at two aggregated levels; the SA4 and major-by-state benchmark level. At these levels of aggregation, the direct estimates were plausibly treated as ground truth. For example, approximately 73% and 100% of the direct smoking estimates at the SA4 and major-by-state benchmark level had coefficients of variation below 25%. Direct estimates for other risk factors exhibited similar or superior certainty.
Under the assumption that the SA4 and major-by-state benchmark level direct estimates were the truth, we used the Bayesian analogue of mean absolute relative bias (MARB) and mean relative root mean square error (MRRMSE) to assess model performance. Below we give details at the major-by-state benchmark level using the notation given in Section 3.3.1 of the main paper.
In addition to MARB and MRRMSE, we also derived the interval overlap probability (IOP) for each SA4 and major-bystate benchmark estimate. An IOP of 1 indicated that the modelled 95% highest posterior density interval (HPDI) was entirely contained within the direct estimate 95% confidence interval; the optimal situation. To summarizethe IOPs, we used the mean IOP (MIOP). Given that SA4 level direct estimates were still relatively unstable, we placed greater priority on the MIOP than the MARB and MRRMSE during model selection. See Table 8 and 9 for model metrics for the stage 2 model for all risk factor measures.    Similar to previous work in Australia [71], the prevalence of smoking shows strong spatial patterns (Figs. 11 to 13), with generally lower rates in major cities and less disadvantaged areas. That said, there are several instances where an area classed as a major city shows substantively high rates of smoking. These are predominately most disadvantaged areas in WA, ACT and NSW; some of which are industrial areas with small populations. The median of the point estimates for all very remote areas (0.33) is around 126% higher than that for the non-very remote areas (0.14), with 75% higher median CVs (30.4% compared to 17.3%). By improving the reach, we garner insights about these disadvantaged areas, albeit with higher entropy.

F.2 Alcohol
The prevalence of risky alcohol consumption shows strong spatial patterns (Figs. 14 to 16). The results suggest that less disadvantaged areas have higher proportions of risky alcohol consumption, which generally manifests in higher prevalence in major cities. This is supported by other Australian surveys [1,13] and previous research [72,17]. The effect of the inverse relationship between higher prevalence of risky alcohol consumption and lower socioeconomic disadvantage is mirrored in Figure 10 in the main paper, where the modelled estimate for the non-benchmarked areas is lower than the direct estimate.
Although the figures suggests lower than average prevalence in more disadvantaged and remote areas (e.g. the middle of Australia), the estimates for these very remote regions (such as the Northern Territory) have greater uncertainty and thus these are not substantially different to the national average (see the Additional File). Furthermore, unlike other risk factors where prevalence estimates exhibit relative homogeneity within SES and remoteness groups (see Fig. 5), for risky alcohol consumption the estimates exhibit far greater heterogeneity for more disadvantaged areas in major cities.

F.3 Diet
Unlike the other risk factors, inadequate diet (Figs. 17 to 19) tends to exhibit less dependence with SES and remoteness, which agrees with the broad findings from other Australian surveys [1,11] (see Section A). As a result, inadequate diet was one of the only risk factors for which the modelled estimate for the non-benchmarked areas was lower than the direct estimate. Unlike most of the other risk factor measures where healthy behaviour was generally reserved for major cities, areas with low prevalence of inadequate diet can be found outside major cities. For example some areas in Northern Queensland, Western New South Wales and Western Australia exhibit lower prevalence than the national average. This is supported by the new estimates for very remote areas. The median point estimates for these areas are the same as that for non-very remote areas, albeit with over three times the uncertainty (using CV).

F.4 Weight
The three weight measures (overweight/obese, obese and risky waist circumference) exhibit significant spatial variation with lower proportions in less disadvantaged and urban areas (Figs. 20 to 28). Across all three measures, the prevalence was very strongly tied to remoteness with substantially lower prevalence almost exclusively occurring in major cities. Furthermore, the most notable differences in patterns between the estimates for obese and overweight/obese are found in major cities.
Some low proportions for both obese and overweight/obese are found in the Northern Territory, which is largely composed of very remote and more disadvantaged areas. However, these areas have a high level of uncertainty and thus provide insufficient evidence of a meaningful difference. Across the three weight measures, the uncertainty (using CV) of estimates for very remote areas are, on average, over 2.5 times those for non-very remote areas.

F.5 Physical activity
Similar to our other modelled risk factors and other Australian research [73], both activity variables (Figs. 29 to 34) exhibit high spatial variation, with lower prevalence of inadequate activity in major cities and least disadvantaged areas.
One novelty of this work is estimates for inadequate activity (all) as well as inadequate activity (leisure). The most notable difference between the two was in non-remote areas.
By improving the reach of the SHAA, the modelled risk factors provide insights into the spatial disparities of physical activity in very remote areas of Australia. These areas have, on average, 7% higher prevalence but also over 2.5 times the uncertainty (using CV), compared with non-very remote areas.

G Additional plots
In this section, we provide supplemental plots and maps to those provided in the main paper.   : Choropleth maps displaying the modelled odds ratios (OR) for 2221 SA2s in Australia. ORs above 1 indicate that the prevalence is higher than the national average. The map includes insets for the eight capital cities for each state and territory, with black boxes on the main map indicating each insets' respective location. Note that some values are lower (or higher) than the range of color scales shown; for these values, the lowest (or highest) color is shown. White areas were excluded from estimation due to the exclusion criteria described in Section 2.1 of the main paper. Black lines represent the boundaries of the eight states and territories of Australia.
(a) (b) Figure 12: Choropleth maps displaying the modelled proportion (a) and width of the 95% HPDIs (b) for 2221 SA2s across Australia. The map includes insets for the eight capital cities for each state and territory, with black boxes on the main map indicating each insets' respective location. Note that prevalence values above the 99th and below the 1th percentile were assigned to the highest and lowest color, respectively. Gray areas were excluded from estimation due to the exclusion criteria described in the main paper. Black lines represent the boundaries of the eight states and territories of Australia.
(a) (b) Figure 13: Choropleth maps displaying the exceedance probabilities (a) and evidence classifications (b) for 2221 SA2s across Australia. The map includes insets for the eight capital cities for each state and territory, with black boxes on the main map indicating each insets' respective location. Gray areas were either excluded from estimation due to the exclusion criteria described in the main paper or were not classified according to one of the four categories. Black lines represent the boundaries of the eight states and territories of Australia.

I Simulation experiment
The goal of this simulation study was to explore the relationship between the performance of the TSLN model and the level of smoothing applied by the stage 1 model (Section 3.1.1 in the main paper). Following our previous work [28], here we generated a single synthetic census covering 100 areas. We repeatedly sampled from the synthetic census to obtain D = 100 unique samples (repetitions). Each unique sample had data for 60 areas, leaving the remaining 40 areas with no data. For each unique sample, we fit TSLN models with varying degrees of smoothing.

I.1 Algorithm
The synthetic census (complete once) We created a vector of area level proportions, U, with values equally spaced between 0.05 and 0.3, U = (U 1 = 0.1, . . . , U 100 = 0.4). Then we sampled a random vector of area specific population sizes, N = (N 1 , . . . , N 100 ) from the set {500, 3000}. Note that N = 100 i=1 N i . Next, using a binomial distribution we sampled the area counts, Y i ∼ Binomial(n = N i , p = U i ), which represented the number of 1's in area i. We converted the aggregated data to individual level binary data, y ij ∈ 0, 1, by assigning the binary outcome vector y 1 = (y 11 , . . . , y 1N1 ) with Y 1 1's and N 1 − Y 1 0's. Here y ij is the value for the jth individual in the ith area.
To generate an area level covariate, we first calculated the true area level proportions, µ i = 1 Ni Ni j=1 y ij . Note that µ = (µ 1 , . . . , µ 100 ) was constant for all 100 repetitions. Then, we simulated a random vector, denoted g = (g 1 , . . . , g 100 ), from N 0, 0.01 2 and calculated a continuous covariate, k * i = logit(µ i ) + g i . We also rescaled k * to a mean of zero and a standard deviation of 1. This rescaled version, denoted k, was included in the synthetic census dataset.
Using a sampling fraction of 0.4%, we calculated fixed area sample sizes, n i = round 100 60 × 0.004 × N i . Next, following Hidiroglou and You [64], we simulated z ij = I (y ij = 0) + 0.8h ij for all individuals, where h ij was a random draw from an exponential distribution with rate equal to 1. The values of z ij were used to determine each individual's sampling probability, π ij = z ij Ni j=1 z ij −1 , and sampling weight, w ij = n −1 i π −1 ij , based on the fixed area sample size. The calculation of π ij made individuals with y ij = 0 more likely to be sampled (i.e. the sampled design was informative).
The synthetic census had 168258 individuals with columns y, k, π, w, I, where I ∈ {1, . . . , 100} was an integer vector defining the area for each individual.
Repeated sampling from the synthetic census First, we selected 60 areas proportional to their population size (i.e. randomly selected areas according to Ni N ). Then, within each selected area, we drew an informative sample of size n i based on the sampling probabilities, π ij . Finally, we rescaled the sampling weights to ensure that the sum within area i equaled N i .
Although N i and n i were fixed, the areas to be sampled and which individuals were sampled in each selected area was stochastic, resulting in different n for each repetition d. The median sample size, n, and area sample size, n i was 755, and 7 respectively. Across all repetitions the median (IQR) proportion of sampled areas that gave stable direct estimates was 0.62 (0.58, 0.65). These are relatively similar to the case study presented in the main paper, whereby the highest proportion of sampled areas that gave stable direct estimates was 0.66 and the median area sample size was 8.

I.2 Models
To control the level of smoothing induced by the stage 1 model, we varied the fixed residual error, denoted by σ e . Thus, the stage 1 model was y ij ∼ Bernoulli (π ij )w ij Eq. I. we fit a stage 1 model with and without the area-level random effect. Thus, for each repetition, we fit 26 models. The stage 2 model was constant throughout this simulation study, θ S1 i ∼ N θ i ,τ S1 i + v θ S1 i 1/T Eq. I.2.2 In Eq. I.2.2 Λ 0 is the intercept and Λ 1 is the coefficient for the single area level covariate. Details for the remaining notation can be found in Section 3.1.2 in the main paper and Section C.

I.3 Performance metrics
To summarize the simulation results we compared the modelled prevalence estimates,μ = logit −1 θ i , to the true values, µ i , using mean absolute relative bias (MARB), mean relative root mean square error (MRRMSE), coverage and the width of the 95% highest posterior density intervals (HPDIs).
Letμ idt be the tth posterior draw for repetition d in area i and µ i be the true proportion in area i. Also letμ (L) id andμ (U) id denote the lower and upper bounds, respectively, of the posterior 95% HPDI for area i and repetition d. For each of the 100 repetitions, 13 σ e values and 2 area-level random effect options (in or out) we derive the performance and smoothing metrics. Thus, the results presented here are based on 2600 pairs of performance and smoothing metrics.
To assess how the performance metrics were affected by the SR and the ALC, we fit quadratic quantile regressions with the performance metric as the dependent variable and one of the SR or the ALC as the fixed and quadratic terms.
In the quantile regressions we used the 5th, 20th, 50th (median), 80th, and 95th percentiles.  Fig. 39 shows that the ALC increases to 1 faster than the SR. Hence, the ALC was preferred as a smoothing metric since it had slightly better sensitivity.

I.4 Results
As expected, the TSLN model performs better when the SR and ALC are larger. According to quantile regression on the median, by increasing the SR from 0 to 0.5, MARB and MRRMSE reduces by a factor of 2.04 and 1.32, respectively. Unlike MARB which shows a consistent downward trend (see Fig. 36), the quantile regression lines in Fig. 35 suggest a global minimum for MRRMSE. The minimum appears to occur between 0.4 and 0.7 for the SR and between 0.55 to 0.75 for the ALC, suggesting that increasing the metrics beyond these bounds may actually be ill-advised. Greater priority should be given to ensuring both metrics are close to the midpoint of 0.55. To further emphasizethis point, Table 10 shows how the fitted MRRMSE seems to stagnate for ALC between 0.4 and 0.6.
Percentile ALC = 0 ALC = 0.4 ALC = 0.5 ALC = 0.6 ALC = 0.7 50th 0. 53   The recommendation above is supported by Fig. 37 which shows that the nominal 95% level (given by the horizontal black line) is achieved when the SR = 0.41 or the ALC = 0.55. Furthermore, Fig. 38 shows that the HPDIs stay relatively constant until both the SR and ALC reach 0.5, at which point the width of the HPDIs appear to climb more steeply. Figure 35: Scatter plot visualising the relationship between the smoothing ratio (SR) (left) and area linear comparison (ALC) (right) with the mean relative root mean squared error (MRRMSE). Grey points denote the observed MRRSME and SR or ALC pairs. The five colored lines give the fitted values from univariate quadratic quantile regression using the 5th, 20th, 50th (median), 80th, and 95th percentiles. Red denotes the median quantile fitted line. The three vertical dotted lines give the global minimums of the 50th, 80th and 95th percentile fitted lines.